DETAILED ACTION
Response to Amendment 

1. In response to the office action from 7/29/2004, the applicant has submitted a request for
continued examination, filed 1/29/2005, arguing to traverse the art rejection based on the
limitation regarding the synchronization of caption data that is converted from an audio signal of
an AV signal with a video signal (Amendment, Page 8). Applicant's arguments have been fully
considered; however, the previous rejection is maintained for the reasons listed below in the
response to arguments.

2. In the below rejection, Alshawi has been corrected to indicate the proper patent number 
(5,815,196). 

Response to Arguments 

3. Applicant's arguments have been fully considered but they are not persuasive for the
following reasons: 

With respect to Claims 1, 9, and 17, the applicant argues that Alshawi (U.S. Patent:
5,815,196) in view of Throckmorton et al (U.S. Patent: 5,818,441) fails to teach the
synchronization of caption data that is converted from an audio signal of an AV signal with a
video signal (Amendment, Pages 7-8). However, the examiner notes that Throckmorton teaches a
synchronizer that utilizes time codes from a primary data stream for the alignment of associated
data (prior office action, pages 2-3).

Applicant's arguments that Alshawi teaches away from the claimed invention 
(Amendment, Page 7) fail to comply with 37 CFR 1.111(b) because they amount to a general
allegation that the claims define a patentable invention without specifically pointing out how the 
language of the claims patentably distinguishes them from the references. 

The applicant points out that Throckmorton teaches that associated data refers to a stream 
of data that is generated separately from the primary data (Amendment, Pages 7-8), however this 
statement does not teach away from the presently claimed invention as asserted by the applicant. 
In the claimed invention there is no indication that a singular process is used to generate an AV 
signal and an associated caption signal. According to the claimed invention the AV signal would
also be generated first and separately and, at a later time, the caption signal would be generated 
using a speech-to-text generation means. Thus, since the AV signal and caption signal in the 
claimed invention are not generated simultaneously and utilize different data generation 
processes, Throckmorton does not teach away from the claimed invention. 

With respect to the applicant's argument that Alshawi in view of Throckmorton fails to
teach the synchronization of caption data that is converted from an audio signal of an AV signal
with a video signal (Amendment, Pages 7-8), the examiner notes that Throckmorton teaches the
synchronization of primary data with associated data utilizing time codes (Col. 4, Lines 52-65,
and Fig. 2, Element 20). Throckmorton further discloses that the associated data can be in the
form of a caption (Col. 6, Lines 54-63), while the primary data can be from an AV signal
(television, videotape signal, Col. 3, Lines 36-54). Alshawi teaches the generation of said
captions from an AV signal utilizing speech recognition, as noted in the prior office action
(Pages 2-3). Therefore, since Alshawi teaches a process of automatically generating captions
and Throckmorton teaches synchronizing caption data with an AV signal utilizing time codes,
Claims 1, 9, and 17 remain rejected.

Claims 2, 5-8, 10, 13-16, 18, and 21-24 are argued as further limiting their parent claims 
(Amendment, Pages 8-9). Thus, since the rejection of the independent claims is maintained, 
these dependent claims also remain rejected. 

The applicant's arguments with respect to Claims 3, 11, and 19 have been fully
considered, but are moot in view of the new grounds of rejection in view of Rumreich et al (U.S.
Patent: 5,929,927).

Claim Rejections - 35 USC §103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

5. Claims 1, 2, 6, 7, 8, 9, 10, 14, 15, 17, 18, and 22-24 are rejected under 35 U.S.C. 103(a)
as being unpatentable over Alshawi (U.S. Patent: 5,815,196) in view of Throckmorton et al
(U.S. Patent: 5,818,441).

Regarding Claims 1, 9, and 17, Alshawi discloses a method/computer readable medium
having stored instructions with at least one processor for providing real-time subtitles
[captioning] in an AV signal. The disclosure includes the automatic conversion of an audio
[including speech] signal in the AV signal to text [caption] data. Alshawi describes, in Fig. 1,
a video-based communications device (5,8). The device provides segmentation of an AV signal
(16) and the further processing of the audio [speech] portion of the signal to provide continuous
speech-to-subtitles [speech-to-text] translation (19,21,22) that has the ability to overlay and
display text subtitles onto the AV signal in real-time [captioning] (26). Alshawi does not
disclose synchronizing the caption data with one or more cues in the AV signal. However,
Throckmorton et al. teach a data synchronizing sub-system whose function is to synchronize the
primary data stream generated by sub-system 10 with specific associated data. The input to data
synchronizing sub-system 20 is scene information from the primary data stream in the form of
timecodes and time durations [cues] and data from associated data generator sub-system 16. It
creates a so-called script for the delivery and display of associated data at specific points in
time. The ability to synchronize the associated data with the primary data has many benefits,
including helping hearing impaired viewers to better understand AV content and providing
relevant data such as synchronized captions that pertain to a television broadcast in real-time
(Throckmorton, providing real-time associated data, Col 7, Lines 59-67).

Therefore, it would have been obvious to one of ordinary skill at the time of the invention
to modify Alshawi with the synchronization of the caption data with one or more cues in the AV 
signal as taught by Throckmorton et al. since it would have enhanced the viewing experience of 
the hearing impaired and provided relevant data such as synchronized captions that pertain to a 
television broadcast in real-time. 
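
For illustration only, the time-code based alignment discussed above can be sketched as follows. This is a minimal sketch and not the implementation of Throckmorton or of the claimed invention: the names Scene, ScriptEntry, build_script, and captions_at are hypothetical, and the one-to-one pairing of each cue with one caption is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """A cue from the primary (AV) stream. Hypothetical structure."""
    timecode: float  # start of the scene, in seconds
    duration: float  # length of the scene, in seconds


@dataclass
class ScriptEntry:
    """One line of the display script: when to show and hide a caption."""
    show_at: float
    hide_at: float
    text: str


def build_script(scenes, captions):
    """Pair each scene cue with one caption to form a display script.

    Assumes scenes and captions arrive in matching order, one caption
    per cue; extra items on either side are ignored by zip().
    """
    script = [
        ScriptEntry(s.timecode, s.timecode + s.duration, text)
        for s, text in zip(scenes, captions)
    ]
    return sorted(script, key=lambda entry: entry.show_at)


def captions_at(script, t):
    """Return the caption text that should be visible at playback time t."""
    return [e.text for e in script if e.show_at <= t < e.hide_at]


# Example usage with made-up cues and captions.
scenes = [Scene(timecode=0.0, duration=2.5), Scene(timecode=2.5, duration=3.0)]
script = build_script(scenes, ["Hello", "World"])
visible = captions_at(script, 1.0)
```

In this sketch the script is simply a sorted list of display intervals, so a player only needs the current time code of the primary stream to look up which caption to show.
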

Regarding Claims 2, 10, and 18, Alshawi discloses a method and apparatus that captures
an AV signal and further provides the audio [speech] portion of the signal for conversion to text.
Alshawi describes a videophone receiver that has an input signal that comprises a camera that
represents the visual component of the communication and a microphone that represents the
audio component of the signal that have been encoded (Col 2, 33-40). In addition, Alshawi
describes an audio/video decoder that accepts an AV input and separates the signal into two
entities, video signal and audio signal (Col 2, 51-55).

Regarding Claims 6, 14, and 22, Alshawi discloses a display that shows at least the 
video and text [caption] data. Alshawi describes simultaneously displaying the sending party's
video overlaid with real-time subtitles [captions] that translate the sender's speech (Col 3,
26-29).

Regarding Claims 7 and 23, Alshawi discloses an integrated method and apparatus for
providing real-time subtitles [captioning] in an AV signal. The disclosure includes the automatic
conversion of an audio (including speech) signal in the AV signal to text (caption) data and
associating the audio and text (caption) data at a time that corresponds to the video signal
wherein the signal combination processing system synchronizes the caption data with one or
more cues in the AV signal. Alshawi describes, in Fig. 1, a video-based communications device
(5,8). The device provides segmentation of an AV signal (16) and the further processing of the
audio [speech] portion of the signal to provide continuous speech-to-subtitles (speech-to-text)
translation (19,21,22) that has the ability to overlay and display text subtitles onto the AV
signal in real-time [captioning] (26).

Regarding Claims 8, 15, and 24, Alshawi discloses a method and apparatus for 
translating speech and captions into a second language. Alshawi describes an embodiment where 
the textual signal is translated into a target language that is then overlaid onto the video signal as 
real-time subtitles [captions] (Col 3, 46). 

6. Claims 3, 11, and 19 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Alshawi in view of Throckmorton et al, as applied to claims 1, 9, and 17 above, and further in
view of Rumreich et al (U.S. Patent: 5,929,927).

Regarding Claims 3, 11 and 19, the combination of Alshawi and Throckmorton et al. 
discloses an integrated method and apparatus for providing real-time subtitles (captioning) in an 
AV signal. The disclosure includes the automatic conversion of an audio [including speech] 
signal in the AV signal to text (caption) data and associating the audio and text (caption) data at a 
time that corresponds to the video signal. Alshawi describes, in Fig. 1, a video-based
communications device (5,8). The device provides segmentation of an AV signal (16) and the 
further processing of the audio (speech) portion of the signal to provide continuous speech-to- 
subtitles (speech-to-text) translation (19,21,22) that has the ability to overlay and display text 
subtitles onto AV signal in real-time [captioning] (26). 

Alshawi in view of Throckmorton does not specifically suggest a method of converting
the audio portion of the signal to text data that checks whether the amount of caption data is
greater than a threshold amount; however, Rumreich teaches the processing of additional
captioning data if a counter indicates that a caption buffer threshold has been exceeded (Col. 10,
Lines 16-34). Rumreich also teaches the use of a timing means (Col. 6, Lines 24-37).

Alshawi, Throckmorton, and Rumreich are analogous art because they are from a similar 
field of endeavor in caption processing. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Alshawi in view of 
Throckmorton with a means for providing an indication for the need of additional caption data if 
a caption buffer exceeds a threshold as taught by Rumreich to ensure that a caption display keeps 
up with audio content (Rumreich, Col 10, Lines 24-27) by providing the caption creation means 
(speech-to-text) taught by Alshawi in view of Throckmorton with an indication of the need for 
additional captioning data. 
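
For illustration only, a counter-based caption buffer threshold check of the general kind discussed above can be sketched as follows. This is a minimal sketch and not the implementation of Rumreich or of the claimed invention: the class CaptionBuffer and its methods are hypothetical, and counting buffered characters (rather than bytes or caption lines) is an assumption.

```python
from collections import deque


class CaptionBuffer:
    """Hypothetical caption buffer with a counter and a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold  # maximum buffered characters before action
        self.count = 0              # counter of characters currently buffered
        self._queue = deque()

    def push(self, text):
        """Add newly generated caption text and update the counter."""
        self._queue.append(text)
        self.count += len(text)

    def over_threshold(self):
        """Report whether the amount of caption data exceeds the threshold."""
        return self.count > self.threshold

    def drain(self):
        """Process the oldest captions until the counter is back within the threshold.

        Returns the caption strings that were processed, oldest first.
        """
        processed = []
        while self.over_threshold() and self._queue:
            text = self._queue.popleft()
            self.count -= len(text)
            processed.append(text)
        return processed


# Example usage with made-up caption text.
buf = CaptionBuffer(threshold=10)
buf.push("hello")
buf.push("captions")
if buf.over_threshold():
    shown = buf.drain()
```

The check is kept separate from the processing step so that the same counter can drive either immediate display or a request for further caption processing.
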

7. Claims 5, 13, and 21 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Alshawi in view of Throckmorton et al, as applied to claims 1, 9, and 17 above, and further in
view of Angell et al (U.S. Patent: 6,513,003). 

Regarding Claims 5, 13, and 21, Alshawi does not show the embedding (encoding) of 
the text [caption] data within the AV signal. Instead, a subtitle generator (24, Fig. 1) is used to 
overlay text data onto the AV signal. However, Angell et al. teach the embedding (encoding) of
the text [caption] data within the AV signal (Fig 1(108, 140); Col 4, Line 55 - Col 5, Line 17). 
Embedding of a text signal by synchronizing and encoding the text with the audio video signal 
allows the composite signal to be played on a conventional display device at any location. 

Therefore, it would have been obvious to one of ordinary skill in the art at the time of 
invention to modify the combination of Alshawi and Throckmorton by embedding (associating) 
text data with AV signal data using a database as taught by Angell et al. to improve on the real 
time dissemination of the composite audio video signal with closed caption. 

8. Claim 16 is rejected under 35 U.S.C. 103(a) as being unpatentable over Alshawi in view 
of Throckmorton et al, and further in view of Kirkland et al (U.S. Patent: 5,900,908).

Regarding Claim 16, the combination of Alshawi and Throckmorton et al. discloses an 
integrated method and apparatus for providing real-time subtitles [captioning] in an AV signal. 
The disclosure includes the automatic conversion of an audio [including speech] signal in the AV 
signal to text (caption) data and associating the audio and text (caption) data at a time that 
corresponds to the video signal. The combination of Alshawi and Throckmorton et al. does not
show portability and the utilization of the device in the classroom. However, Kirkland teaches a 
method of providing encoding caption data into the program signal. The apparatus receives a 
television signal with various description data including caption data (Col 9, 15). The device 
itself is a set-top box that can be co-located with a television (portable) and which can be used 
for live performances, classrooms and other types of presentations (Col 3, 29 and Col 10, 60).
Devices with such features help the handicapped or hearing impaired by providing portable text or
audio services.

Therefore, it would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify the combination of Alshawi and Throckmorton et al. by making it 
portable for use in such venues as a classroom, as taught by Kirkland, in order to improve on the
capability of the captioning system for use in the classroom. 

Conclusion 

9. The prior art made of record and not relied upon is considered pertinent to applicant's
disclosure: 

Wactlar et al (U.S. Patent: 5,835,667)- utilizes time stamps in video and transcript text
(generated by speech recognition) to synchronize captions. 

Funaki et al (U.S. Patent: 5,983,035)- teaches a means for indicating the need for 
additional caption processing if a time period expires or caption data exceeds a buffer threshold. 

Ortega et al (U.S. Patent: 6,332,122)- teaches a means for synchronizing caption data
with speech. 

10. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to James S. Wozniak whose telephone number is (703) 305-8669 
and email is James.Wozniak@uspto.gov. The examiner can normally be reached Monday through
Friday, 8:30-4:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Doris To, can be reached at (703) 305-4827. The fax/phone number for the
Technology Center 2600 where this application is assigned is (703) 872-9306. 

Any inquiry of a general nature or relating to the status of this application or proceeding 
should be directed to the technology center receptionist whose telephone number is (703) 306-0377.



James S. Wozniak 
3/2/2005 





